Skip to content

feat(backend): Lifecycle Manager + Session Manager lane#2

Merged
harshitsinghbhandari merged 25 commits into
mainfrom
feat/lcm-sm-contracts
May 27, 2026
Merged

feat(backend): Lifecycle Manager + Session Manager lane#2
harshitsinghbhandari merged 25 commits into
mainfrom
feat/lcm-sm-contracts

Conversation

@harshitsinghbhandari

@harshitsinghbhandari harshitsinghbhandari commented May 26, 2026

Copy link
Copy Markdown
Collaborator

Summary

This branch is the complete Lifecycle Manager (LCM) + Session Manager (SM) lane of the backend rewrite — the deterministic core that supervises agent sessions and keeps one true status per session. It was built contract-first on a single integration branch; the sub-PRs below merged into this branch, and it now lands on main as one cohesive unit.

Full documentation lives in docs/: start with docs/architecture.md, and see docs/status.md for what's done vs. what's left.

What it delivers

A per-session OBSERVE → DECIDE → ACT loop where OBSERVE lives outside the LCM (observers call in) and the LCM is a synchronous reducer:

  • domain/ — the canonical state model (CanonicalSessionLifecycle, persisted) and DeriveLegacyStatus, the sole producer of the display status (derived on read, never stored).
  • domain/decide/ — the pure, total, deterministic decision core (liveness reconciliation, the PR ladder, the anti-flap detecting quarantine), table-tested to 100%.
  • ports/ — the stabilized boundary: inbound LifecycleManager/SessionManager, outbound LifecycleStore/Notifier/AgentMessenger/Runtime/Agent/Workspace, and the SCMFacts/RuntimeFacts/ActivitySignal DTOs.
  • lifecycle/ — the LCM: the Apply* pipeline (load → decide → diff → persist), per-session serialization, the liveness-vs-activity composition rules, and the reaction table + escalation engine (TickEscalations).
  • session/ — the SM: spawn / kill / restore / cleanup / list, with eager rollback and the worktree-remove safety.

Folded-in PRs (merged into this branch)

Key invariants

  • Persist canonical state; derive the display status on read.
  • One authority for death — only the decide pipeline (via detecting) writes inferred terminal states; the SM's explicit-kill path goes through OnKillRequested.
  • Failed probe ≠ dead; PR facts dominate the soft session states once a PR exists; worktree-remove safety on teardown.

Status

Implemented and tested entirely behind in-memory fakesgofmt / go build / go vet / go test -race green across domain, decide, lifecycle, and session. The SM tests drive the real LCM for spawn/kill round-trips.

Not in this lane (integration phase, tracked in #3): swapping fakes for real adapters (persistence/CDC, SCM poller, runtime/agent/workspace plugins, notifier, API), plus three carried-forward items — the react() out-of-lock dispatch ordering, the unused ExpectedRevision CAS for multi-writer/CDC, and a real implementation of the store's new Seed/Get. See docs/status.md for the full hand-off.

Test plan

  • cd backend && gofmt -l . && go build ./... && go vet ./... && go test -race ./... all green
  • Review the cumulative diff vs. main as one lane (sub-PRs were reviewed against the integration branch)

🤖 Generated with Claude Code

Contract-first boundary for the Lifecycle Manager + Session Manager lane.
Pure shapes only — types, interfaces, and the display-status derivation —
so adil (SCM poller), Tom (persistence), and aditi (API) can review and
build against a stable boundary before any behaviour lands.

domain/
  - CanonicalSessionLifecycle: the only persisted state (session/pr/runtime
    sub-states), with Activity + Detecting sub-states added as decider inputs
    that must survive between observations.
  - DeriveLegacyStatus: the sole producer of the derived display status
    (never persisted), with 11 table tests.
ports/
  - inbound: LifecycleManager (Apply* pipeline, per-session serialised) and
    SessionManager.
  - outbound: LifecycleStore (Tom), Notifier, AgentMessenger, and the
    Runtime/Agent/Workspace plugin ports (co-owned with the agents lane).
  - facts: SCMFacts / RuntimeFacts / ActivitySignal DTOs.
decide/ pure-core signatures + I/O types; bodies stubbed for the next PR.

Folds in four design-review fixes (documented in-code, pending team confirm):
  1. Activity + Detecting persisted so the pure decider has memory across calls.
  2. Per-session serialisation documented; LifecyclePatch.ExpectedVersion
     offers optimistic-locking as an alternative.
  3. LifecyclePatch is a sparse pointer-field merge-patch (+ ClearDetecting).
  4. SCMFacts gains Fetched (failed fetch != "PR closed") and per-comment
     IsBot (bot vs human route to different reactions).

go build / go vet / go test all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces the initial contract-first Go packages for the backend “Lifecycle Manager (LCM) + Session Manager (SM)” lane, defining the canonical lifecycle domain model, inbound/outbound ports, and a single pure status-derivation function (with tests) to stabilize boundaries for parallel implementation work.

Changes:

  • Added domain canonical lifecycle/state enums, Session read-model, and DeriveLegacyStatus (plus table-style tests).
  • Added ports package with inbound interfaces (LCM/SM), outbound interfaces (store/notifier/runtime/agent/workspace), and fact DTOs (SCM/runtime/activity/spawn/kill).
  • Added domain/decide package with pure-core I/O shapes and stubbed function bodies for a follow-up PR.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
backend/internal/ports/outbound.go Outbound ports for persistence, notifications, and runtime/agent/workspace plugins
backend/internal/ports/inbound.go Inbound contracts for LifecycleManager and SessionManager
backend/internal/ports/facts.go DTOs (“facts”) crossing the LCM/SM boundary (SCM, runtime, activity, spawn/kill)
backend/internal/domain/status.go Derived display status enum + DeriveLegacyStatus
backend/internal/domain/status_test.go Tests for DeriveLegacyStatus
backend/internal/domain/session.go API read-model Session definition
backend/internal/domain/lifecycle.go Canonical persisted lifecycle model + enums/substates
backend/internal/domain/decide/decide.go Decide-core input/output shapes and tuning constants (stubbed bodies)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread backend/internal/ports/outbound.go Outdated
Comment thread backend/internal/ports/outbound.go Outdated
Comment thread backend/internal/domain/status.go Outdated
Comment thread backend/internal/ports/facts.go Outdated
Comment thread backend/internal/ports/outbound.go Outdated
Comment thread backend/internal/ports/outbound.go
Comment thread backend/internal/ports/facts.go
Comment thread backend/internal/ports/outbound.go Outdated
harshitsinghbhandari and others added 2 commits May 26, 2026 20:55
Review feedback on PR #2:

- Add CanonicalSessionLifecycle.Revision (monotonic write counter) distinct
  from the schema Version; LifecyclePatch.ExpectedVersion -> ExpectedRevision
  now compares it, so optimistic locking actually works.
- LifecycleStore.List returns []domain.SessionRecord (persistence shape, no
  derived status); add SessionRecord and make Session embed it. Keeps the
  Session Manager the single producer of the derived display status.
- SpawnOutcome.RuntimeHandle is now the structured ports.RuntimeHandle, not a
  string, so Destroy/SendMessage get the handle without ad-hoc encoding.
- Agent.IsProcessRunning -> ProbeProcess returning ProcessProbe (liveness),
  not domain.ActivityState; the name no longer implies a boolean.
- Document LifecyclePatch Detecting vs ClearDetecting precedence (clear wins).
- Correct the DeriveLegacyStatus doc: hard session states outrank PR facts;
  "PR dominates" applies only to the soft idle/working states. Implementation
  was already correct (matches canonical AO); only the comment overstated it.
- Replace personal-name attributions in package/interface comments with
  role-based terms (SCM poller / persistence adapter / API layer).

go build / go vet / go test all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- go.yml: gofmt check, build, vet, and race-enabled tests for backend/,
  triggered on backend changes and pushes to main.
- gitleaks.yml: secret scanning on PRs and main using gitleaks-action v1
  (license-free; v2 requires GITLEAKS_LICENSE for org repos).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tests

Replace the stubbed deciders in domain/decide with real, total,
side-effect-free implementations:

- ResolveProbeDecision: kill-intent short-circuits to terminal; failed
  probes and probe disagreement route to detecting; only runtime-dead +
  process-dead + no-recent-activity concludes killed.
- ResolveOpenPRDecision: the PR pipeline ladder
  (ci_failing > changes_requested > approved+mergeable > approved >
  review_pending > idle-beyond > open).
- ResolveTerminalPRStateDecision: merged -> idle/merged_waiting_decision,
  closed -> idle.
- CreateDetectingDecision: anti-flap quarantine — unchanged-evidence
  counter with StartedAt preserved across the episode so the duration cap
  is a real wall-clock safety net; escalates to stuck at 3 ticks or 5m.
- HashEvidence: strips timestamps/epochs and collapses whitespace before
  hashing so restamped-but-unchanged signals compare equal.

Table tests cover every branch (100% statement coverage), including a
consistency check that the open-PR ladder's display Status matches
DeriveLegacyStatus over the canonical state it emits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
harshitsinghbhandari and others added 8 commits May 27, 2026 00:06
…ency test

Address PR review (aa-1):
- ResolveOpenPRDecision now keys MERGEABLE on Mergeable alone (checked
  before Approved). Mergeability is the authoritative merge gate, so a PR
  on a no-required-review repo no longer falls through to PR_OPEN. The
  approved+mergeable and approved-only cases are unchanged.
- Broaden the derive-consistency test to cover the probe and terminal
  deciders too, not just the open-PR ladder.
- Document the HashEvidence epoch-stripping regex's breadth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ting helper

Address PR review (aa-1):
- Move the decider input/output type definitions (LifecycleDecision,
  ProbeInput, ProcessLiveness, OpenPRInput, DetectingInput) out of the
  execution file into a dedicated types.go, leaving decide.go for behaviour.
- Expand the detecting() helper doc to spell out that it adapts a probe
  verdict into the shared CreateDetectingDecision anti-flap path so each
  probe branch doesn't re-implement the quarantine counter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Clarify the timestamp-stripping block flagged in review: spell out what
each of the three regexes matches (full ISO/RFC3339 datetime, bare
time-of-day, bare unix epoch) and why order matters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… contract

Address Copilot review:
- ResolveOpenPRDecision now sets a stable, timestamp-free Evidence summary
  "<condition> #<num> <url>" for every ladder outcome, consuming the
  previously-unused OpenPRInput.Number/URL identity inputs and making PR
  decisions traceable in logs. Covered by TestResolveOpenPRDecisionEvidence.
- Document LifecycleDecision's zero-value contract: an empty PRState/PRReason
  means "this decider does not address PR — leave unchanged", not PRNone. The
  LCM must map empty PR fields to a nil LifecyclePatch.PR; writing PRNone on a
  probe tick would clobber a live PR. (Pointers were considered but the empty
  sentinel is distinguishable from every valid state and consistent with the
  codebase's value-enum style; LifecyclePatch already owns nil-means-unchanged.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat(decide): pure DECIDE core + exhaustive truth-table tests
Implements ports.LifecycleManager as a synchronous observe->decide->persist
reducer. Every entrypoint runs the shared pipeline under a per-session lock:
load canonical -> run the matching pure decider -> diff into a sparse
merge-patch -> persist. Never polls, never writes the display status.

- ApplyRuntimeObservation -> probe decider; always writes the runtime axis.
- ApplySCMObservation -> open-PR / terminal-PR deciders (failed fetch is a
  no-op: failed probe != "no PR"). Open PRs write only the PR axis.
- ApplyActivitySignal -> updates the activity axis + maps onto the session
  axis; only valid-confidence signals are authoritative.
- OnSpawnCompleted -> runtime alive + handles to metadata; session stays
  not_started (display: spawning).
- OnKillRequested -> SM's explicit terminal-write authority.
- TickEscalations -> no-op stub (reaction/escalation engine is split B).

Composition rule (#1): liveness owns the runtime + death axis; activity owns
the working/idle/waiting axis. A healthy probe verdict writes the session axis
only to recover a liveness-owned state, so it never clobbers an activity-owned
needs_input/blocked. Activity is the mirror: it stays off the death axis.

Detecting clear (#3): a non-detecting probe verdict clears stale detecting
memory so the next probe reads no phantom Prior.

Built/tested against in-memory fakes (LifecycleStore with full merge-patch +
ExpectedRevision, recording Notifier/AgentMessenger). Per-session
serialisation verified under -race.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- shouldWriteSessionRuntime: never resurrect a terminal session; an
  observation may refresh the runtime axis but must touch neither the
  session axis nor the detecting memory (gated in ApplyRuntimeObservation).
- OnSpawnCompleted: error on an unseeded session instead of fabricating a
  partial record (SM must seed first — a missing seed is a contract violation).
- OnKillRequested: no-op on an unknown/already-gone session (benign race)
  instead of fabricating a terminal record.
- keyedMutex: reference-count entries and evict on last release so the lock
  map stays bounded in a long-running daemon.
- runtimeSubstateFromFacts: map RuntimeProbeIndeterminate to RuntimeUnknown
  with a neutral reason, distinct from the probe_error of a failed probe.

Adds tests for terminal non-resurrection, unseeded spawn-completed error,
and unknown-session kill no-op.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
harshitsinghbhandari and others added 3 commits May 27, 2026 01:49
Address Harshit's PR #5 review (approve w/ design confirms + polish):

- #1 (design decision): a valid activity signal is proof of life, so it now
  resolves a detecting session — writes the activity-mapped session state and
  clears the quarantine memory. Scoped to detecting only; a liveness-escalated
  stuck stays the probe pipeline's to resolve. Terminal still never reopens.
- #2: document why a merged/closed PR parks the session axis even over an
  activity-owned needs_input/blocked (a merge is a milestone), unlike the
  open-PR path that defers to activity.
- #3: map plain idle activity to a neutral session reason instead of the
  misleading research_complete (kept for ready, which implies completion).
- #6: cover all three kill kinds (manual/cleanup/error), the open-PR review
  branches (changes_requested/mergeable/review_pending), and the neutral idle
  reason. Coverage 86.5% -> 88.6%.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat(lifecycle): LCM Apply* pipeline (split A)
…t B)

Add the ACT half of the LCM: map persisted status transitions to reactions
(send-to-agent / notify / auto-merge) and drive escalation.

- reactions.go: the §4.2 default reaction table, reactionEventFor (mirrors
  DeriveLegacyStatus for the ACT layer), in-memory per-(session,reaction)
  escalation trackers, the react() dispatch chokepoint, and a real
  TickEscalations for duration-based escalations the synchronous LCM can't
  wake itself for. auto-merge action exists but is off by default;
  bugbot-comments/merge-conflicts are configured but dormant (no decide-core
  producer yet).
- manager.go: mutate now returns a transition; each Apply* path fires the
  mapped reaction after persist via the single synchronous react() seam.
  OnKillRequested intentionally does not react (explicit kill != inferred
  event). Split-A load->decide->diff->persist behavior is unchanged.

ci-failed budget is persistent across fail->pending->fail oscillation;
non-persistent trackers reset when the status leaves the triggering state.
Escalation silences further auto-dispatch until the condition clears.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
harshitsinghbhandari and others added 6 commits May 27, 2026 02:26
 review)

Address review finding #1: the persistent ci-failed tracker leaked and could
stale-silence a future regression. It was only cleared when leaving the
ci-failed reaction AND incidentOver held at that moment — so a recovery to
another open-PR state (ci-failed -> approved -> merged) never cleared it.

- react() now clears ALL of a session's trackers when the state REACHED is
  incident-over (PR resolved / session terminal) OR a genuine recovery
  (approved/mergeable, which the open-PR ladder guarantees means CI is no
  longer failing). Keyed on the state reached, not the one left, since the
  recovery transition is typically review_pending->approved (empty beforeKey).
- Persistent ci-failed still survives the ambiguous review_pending limbo, so
  fail->pending->fail keeps one shared budget (§4.2).
- Document the out-of-lock react() dispatch caveat for the daemon integration
  step (review #2) and the intentionally-skipped agent-stuck 10m threshold.

Tests: re-arm after a genuine recovery (regression re-nudges, not silenced);
all session trackers cleared once the incident is over.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…escalation (Copilot review)

- OnKillRequested now clears the session's escalation trackers after a
  successful kill, so a later duration-based TickEscalations can't emit
  reaction.escalated for a dead session (dispatch is still skipped).
- sendToAgent rolls back the attempt (and firstAttemptAt when it set it) on a
  messenger.Send error, so undelivered messages don't march a reaction toward
  escalation — honoring "send failures retry next tick" (§4.3).
- Duration escalation now uses an inclusive boundary (>=) in both shouldEscalate
  and TickEscalations, so a 30m reaction escalates at exactly 30m instead of
  waiting for the next tick.

Tests: kill clears trackers + no post-kill escalation; repeated failed delivery
never escalates; duration escalation fires at exactly escalateAfter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat(lifecycle): ACT layer — reaction table + escalation engine (split B)
…store/cleanup)

Implements ports.SessionManager against fakes for the outbound ports. The SM is
the explicit-mutation half of the lane: it drives Runtime/Agent/Workspace, seeds
the initial lifecycle, and routes outcomes to the LCM (OnSpawnCompleted /
OnKillRequested). It never derives observed state and is the single producer of
the derived display status (attached on read, never persisted).

- Spawn: Workspace.Create -> Runtime.Create (AO_* identity env) -> Seed ->
  OnSpawnCompleted, with eager rollback of completed steps on failure.
- Kill: OnKillRequested first -> Runtime.Destroy -> Workspace.Destroy, honoring
  the worktree-remove safety (refusal surfaced, never forced).
- List/Get: derive status via DeriveLegacyStatus. Send: via AgentMessenger.
  Restore: re-seed (reopen) + relaunch via GetRestoreCommand. Cleanup: reclaim
  terminal sessions, skip worktrees holding uncommitted work.

Store-contract additions (co-owned with Tom's persistence layer, flagged for
review): LifecycleStore.Seed (explicit create-with-identity; OnSpawnCompleted
requires a seeded record) and LifecycleStore.Get (single record-with-identity
read; Load is lifecycle-only). Lifecycle test fake updated to satisfy both.

Tests route through the real LCM Manager (wrapped to record call order).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ntime on post-create failure (PR #7 review)

- Restore now fails early with a clear error if MetaAgentSessionID is missing,
  rather than emitting an ambiguous "resume nothing" launch command (no stored
  prompt means a fresh-launch fallback isn't possible).
- On a post-runtime-create failure (reopen patch or OnSpawnCompleted), best-effort
  destroy the newly created runtime (never the workspace, which holds prior work)
  so we don't strand a live process while parking the session terminal.
- Added a test for Restore with a missing agent session id: errors early, touches
  no workspace/runtime, leaves the session terminal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e rollback (review follow-ups)

Adds an injectable OnSpawnCompleted failure to the recording LCM and two tests:
- Spawn: when OnSpawnCompleted fails, the seeded record is parked terminal/errored
  (via OnKillRequested(KillError)) and runtime+workspace are torn down.
- Restore: when OnSpawnCompleted fails post-create, the new runtime is destroyed
  while the workspace is preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat(session): Session Manager — spawn/kill/list/get/send/restore/cleanup
Add docs/ for newcomers: an index, an architecture deep-dive (the
OBSERVE→DECIDE→ACT loop, the canonical state model, the package layout, every
component, and the load-bearing invariants), and a status/roadmap (what's done
PR-by-PR, what's left, the integration to-dos + carried-forward items, the open
cross-lane contract questions, and where to plug in). Link them from the README.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@harshitsinghbhandari harshitsinghbhandari changed the title feat(backend): LCM + Session Manager contract package feat(backend): Lifecycle Manager + Session Manager lane May 27, 2026

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 4 comments.

Comment thread backend/internal/session/manager.go
Comment thread backend/internal/session/manager.go
Comment thread backend/internal/ports/inbound.go Outdated
Comment thread backend/internal/lifecycle/reactions.go
Address PR #2 Copilot review comments on the merged LCM+SM lane:

- session: validate runtime handle + workspace path before Kill/Cleanup
  teardown; refuse (ErrIncompleteTeardownMetadata) or skip rather than
  hand empty args to a real adapter's Destroy (unsafe delete).
- session: reject Restore unless the session is terminal
  (ErrNotRestorable) so a live session can't spawn a duplicate
  runtime/workspace.
- ports: document SpawnConfig.OpenTerminal as reserved/not yet honored.
- lifecycle: remove the unread reactionConfig.auto field; note
  approved-and-green is notify-only (human decides to merge).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fix(session): harden teardown/restore safety + drop dead reaction flag
@harshitsinghbhandari

Copy link
Copy Markdown
Collaborator Author

All 4 review comments addressed in #8 (f03c7c8), now merged into this branch — resolving the threads. Verified on the current tip (CI build-test + scan green; local gofmt/build/vet/test -race green):

  1. Kill/Cleanup teardownErrIncompleteTeardownMetadata: required handles (workspace path, runtime id/name) are validated before recording intent or calling any adapter Destroy, so an empty-arg (unsafe) delete can't happen on a partial/orphaned record.
  2. RestoreErrNotRestorable: Restore now rejects a non-terminal session, preventing a duplicate runtime/workspace for a live id.
  3. SpawnConfig.OpenTerminal — documented as reserved / not yet honored by Spawn (planned terminal-lane feature), so callers aren't misled.
  4. reactionConfig.auto — removed the dead/derivable flag (the only auto:false entry is a notify, which is the human-decision path; auto-merge stays off).

@harshitsinghbhandari harshitsinghbhandari merged commit cdfd97c into main May 27, 2026
2 checks passed
neversettle17-101 added a commit that referenced this pull request Jun 12, 2026
…viewer to worker harness

Per PR #197 review feedback:
- Reviewer agent posts its review to the PR itself, so remove the
  ports.PRReviewPoster port, the GitHub review poster, the submit HTTP
  route + DTO, and the service Submit method (#1, #4, #7).
- Trigger spawns the reviewer agent over the worker's worktree with its
  own review prompt, mirroring the session launch flow (resolve agent by
  harness -> argv -> runtime.Create) (#8, #9).
- Default reviewer harness reuses the worker's harness when supported,
  falling back to claude-code; reviewer config stays independent of the
  worker override (#5, #6).
- Drop the `ao review` CLI for this PR's scope (#2, #3).

Regenerated OpenAPI + TS types.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Vaibhaav-Tiwari pushed a commit that referenced this pull request Jun 14, 2026
* feat(review): configurable AO code review backend (V1)

Add per-project configurable code review of a worker's PR. A reviewer
agent runs one-shot over the worker's own worktree and posts its result
to the PR; the worker picks the feedback up through the existing SCM
observer review-nudge path.

- domain: ProjectConfig.reviewers (+ default reviewer harness), Review /
  ReviewRun types and verdict/status vocab.
- storage: review + review_run tables (0011), sqlc queries, store methods.
- service/review: rewrite the in-memory stub as a persisted ReviewService
  (Trigger/Submit/List) with a reviewer Runner over agent resolver +
  runtime; ports.PRReviewPoster implemented on the GitHub adapter.
- http: session-scoped routes POST /sessions/{id}/reviews/trigger,
  POST .../submit, GET .../reviews; regenerated OpenAPI + TS types.
- cli: ao review trigger|submit|list.
- frontend: adapt ReviewDashboard to the per-worker reviews API.

Closes #192

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(review): address review — drop submit/poster/CLI, default reviewer to worker harness

Per PR #197 review feedback:
- Reviewer agent posts its review to the PR itself, so remove the
  ports.PRReviewPoster port, the GitHub review poster, the submit HTTP
  route + DTO, and the service Submit method (#1, #4, #7).
- Trigger spawns the reviewer agent over the worker's worktree with its
  own review prompt, mirroring the session launch flow (resolve agent by
  harness -> argv -> runtime.Create) (#8, #9).
- Default reviewer harness reuses the worker's harness when supported,
  falling back to claude-code; reviewer config stays independent of the
  worker override (#5, #6).
- Drop the `ao review` CLI for this PR's scope (#2, #3).

Regenerated OpenAPI + TS types.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(review): restore ao review submit (records verdict+body in AO)

Per maintainer request, bring back `ao review submit`. AO records the
reviewer's verdict and body on the review_run and marks the pass complete;
it does not post to GitHub — the reviewer agent posts its review to the PR
itself.

- storage: add review_run.body (0011), persist via Insert/UpdateReviewRunResult.
- service: restore Submit (no SCM poster) storing verdict + body.
- http: restore POST /sessions/{id}/reviews/submit + SubmitReviewInput.
- cli: ao review submit [worker] --verdict --body (worker from arg/--session/$AO_REVIEW_WORKER).
- runner: reviewer prompt instructs posting to GitHub and recording via ao review submit.

Regenerated OpenAPI + TS types.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(review): move reviewer runner to its own package; sharpen prompt

Per PR #197 review:
- Move the concrete reviewer runner out of the service layer into a new
  internal/review_runner package (package reviewrunner), beside other
  orchestration packages like session_manager. The service keeps only the
  Runner interface + RunSpec it depends on; the agent-resolver + runtime
  launch flow lives in review_runner.
- Sharpen the reviewer prompt: tell the agent to diff against the PR base,
  focus on high-confidence findings, post via `gh pr review`, and record
  the result with `ao review submit`; review-only (no commits/edits).
- Add unit tests for the runner.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(review): simplify review_run schema; provider-agnostic reviewer prompt

Per PR #197 review:
- review_run: status default 'running' (drop 'pending'), drop CHECK
  constraints on status/verdict, drop the updated_at column and the
  session/iteration index. Propagated through queries, domain, store,
  service, and tests.
- Reviewer prompt no longer hardcodes GitHub/gh commands — it instructs the
  agent to use whatever review tooling the provider offers, keeping the
  flow extensible across SCM providers.

Regenerated sqlc + OpenAPI/TS.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(review): launch reviewer before persisting the run

Trigger now spawns the reviewer agent first and then writes the review_run
with a status derived from the launch outcome (running on success, failed
if it never started), instead of inserting a running row and correcting it
to failed afterwards.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(review): pluggable reviewer registry distinct from worker harnesses

Reviewers are now their own pluggable adapter set, separate from the worker
agent registry — adding a reviewer (claude-code today, greptile tomorrow) is
a one-line registration that does not widen the worker harness vocabulary,
and a worker harness does not automatically become a valid reviewer.

- domain.ReviewerHarness: a distinct vocabulary (AllReviewerHarnesses) with
  its own IsKnown; ReviewerConfig/Review/ReviewRun use it. ResolveReviewerHarness
  reuses the worker harness only when it is itself a supported reviewer, else
  falls back to claude-code.
- ports.Reviewer: a reviewer-specific contract (ReviewCommand → argv + env)
  that models one-shot / non-prompt CLIs natively instead of forcing every
  reviewer through the worker's interactive GetLaunchCommand(Prompt:...).
- internal/adapters/reviewer: a separate registry + resolver (mirrors the
  worker agent registry) with the claude-code reviewer adapter, which owns the
  review prompt and reuses the worker claude-code launch construction.
- review_runner resolves via the reviewer registry (not the worker
  AgentResolver) and merges AO_REVIEW_WORKER into the adapter's env.
- daemon wires the reviewer resolver. Registry/domain parity is test-enforced.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(review): cover run-scoped reviewer submit

* fix(api): update generated review submit schema

* refactor(review): split core engine (internal/review) from API service

Move the review orchestration (Trigger/Submit/List, run-id generation,
deps, RunSpec/Runner, sentinels) into a transport-independent core package
internal/review (Engine). internal/service/review is now a thin API-flow
boundary: the controller-facing Manager interface + a Service that delegates
to the engine + error re-exports.

This keeps the service layer to API concerns and lets the same engine back a
future in-process CLI trigger without going through HTTP. review_runner now
depends on the core package; daemon builds the engine and wraps it in the
service. No API/schema changes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(review): commit-aware trigger, reviewer handle for UI, no env vars

Reworks the review trigger lifecycle and drops env-based coupling:

- review_run gains target_sha (the reviewed commit) and drops iteration.
  A repeat trigger for the same PR head short-circuits to the existing run.
- review gains reviewer_handle_id: the live reviewer pane's runtime handle,
  reused across passes and exposed in the reviews API so the UI can attach
  its terminal over /mux.
- Trigger flow: if a live reviewer pane exists and a new commit arrived,
  message it to re-review; otherwise spawn a fresh reviewer. The run is
  recorded only after the reviewer is launched.
- No environment variables: the reviewer adapter embeds the explicit
  `ao review submit --session <w> --run <id>` command in the spawn prompt
  and the re-review message. CLI submit requires --run/--session (no env
  fallbacks).
- Merge review_runner into internal/review as a Launcher (spawn/notify/alive).
- Trigger returns 201 for a new pass, 200 when reusing an existing run.

Regenerated sqlc + OpenAPI/TS.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(review): author the reviewer prompt centrally, not in the adapter

Mirror the worker model (session_manager builds the prompt; adapters just
place it via LaunchConfig.Prompt). The reviewer prompt now lives in
internal/review/prompt.go and is passed through ports.ReviewInvocation.Prompt;
the claude-code reviewer adapter just feeds inv.Prompt to its launch command
and returns it as the re-review message. One-shot CLI reviewers may ignore it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(review): split reviewer prompt into system+task, mirroring buildSpawnTexts

Mirror session_manager.buildSpawnTexts for the reviewer: a standing role goes
in the system prompt, the per-pass task (PR/commit + exact `ao review submit`
command) goes in the user prompt. internal/review/prompt.go now returns
(prompt, systemPrompt); both flow through ports.ReviewInvocation and the
claude-code adapter places them via LaunchConfig{Prompt, SystemPrompt}. The
re-review message reuses the per-pass prompt (role already established in the
running pane).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Vaibhaav <user@example.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants